Data refining engine for high performance analysis system and method

ABSTRACT

Price and product attributes from webpages are analyzed over time to identify price changes specific to products on individual webpages and for products across all webpages as well as to identify longitudinal correlations between price changes and product attributes. Users may search the data and set alerts.

CROSS-REFERENCE TO AND INCORPORATION BY REFERENCE OF RELATED APPLICATIONS

This application claims the benefit of and incorporates by reference U.S. Provisional Patent Application No. 61/675,492, filed on Jul. 25, 2012. This application also incorporates by reference co-pending U.S. patent application Ser. No. ______, filed on Jul. 25, 2013, titled, “Adaptive Gathering of Structured and Unstructured Data System and Method,” which application also claims the benefit of U.S. Provisional Patent Application No. 61/675,492.

FIELD

This disclosure relates to a method and system to analyze price and product information.

BACKGROUND

The following description includes information that may be useful in understanding the present invention. It is not an admission that any of the information provided herein is prior art or relevant to the presently claimed invention, or that any publication specifically or implicitly referenced is prior art.

Search engines, such as Google, Bing, and others, search and index vast quantities of information on the Internet. “Crawlers” (a.k.a. “spiders”) utilize URLs obtained from a “queue” to obtain content, usually from webpages. The crawlers or other software store and index some of the content. Users can then search the indexed content, view results, and follow hyperlinks back to the original source or to the stored content (the stored content often being referred to as a “cache”). Computing resources to crawl and index, however, are not limitless. The URL queues are commonly prioritized to direct crawler resources to web page servers which can accommodate the traffic, which do not block crawlers (such as according to “robots.txt” files commonly available from webpage servers), which experience greater traffic from users, and which experience more change in content.

Conventional search engines, however, are not focused on price and product information. If a price changes on a webpage, but the rest of the webpage remains the same, traditional crawlers (or the queue manager) will not prioritize the webpage position in the queue, generally because the price is a tiny fraction of the overall content and the change is not labeled as being significant; conversely, if the webpage changes, but the price and/or product information remains the same, the change in webpage content may cause a traditional crawler to prioritize the webpage position in the queue due to the overall change in content, notwithstanding that the price and product information remained the same.

Conventional search engines, if presented with a query, will find corresponding products. For example, it is possible to search for “men's shoes” and to then be presented with a webpage comprising search results for hundreds of thousands of webpages for men's shoes. The search result may further be narrowed by category of men's shoes, brand, and store. Search engines have been incorporated into online stores, wherein a user may search for products, by keyword and/or by category and results can be ordered by price.

Price history, however, is only narrowly viewed and, when it is, never in the context of a rich attribute set which explores, in detail, which attributes are associated with changes in price. Price histories are not made available in real time, and do not allow intricate comparisons based on stores, merchants, brands, regions, time/date, and other dimensions.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a network and device diagram illustrating exemplary computing devices configured according to embodiments disclosed in this paper.

FIG. 2 is a functional block diagram of an exemplary Indix Server 200 computing device and some data structures and/or components thereof.

FIG. 3 is a functional block diagram of the Indix Datastore 300 illustrated in the computing device of FIG. 2.

FIG. 4 is a flowchart illustrating an embodiment of an Analytics Routine 400.

FIG. 5 is a flowchart illustrating an embodiment of a Core Price Routine 500.

FIG. 6 is a flowchart illustrating an embodiment of an Insights Routine 600.

FIG. 7 is a flowchart illustrating an embodiment of a Volatility Routine 700.

FIGS. 8A-8C are flowcharts illustrating embodiments of a Substitution Routine 800.

FIG. 9 is a flowchart illustrating an embodiment of a Mix Routine 900.

FIG. 10 is a flowchart illustrating an embodiment of a Prediction Routine 1000.

FIG. 11 is a flowchart illustrating an embodiment of a Competition Routine 1100.

FIG. 12 is a flowchart illustrating an embodiment of a Promotion Routine 1200.

FIG. 13 is a flowchart illustrating an embodiment of a Leadership Routine 1300.

FIG. 14 is a flowchart illustrating an embodiment of a Premium Routine 1400.

FIG. 15 is a flowchart illustrating an embodiment of a Price Range Routine 1500.

FIG. 16 is a flowchart illustrating an embodiment of a Reach Routine 1600.

FIG. 17 is a flowchart illustrating an embodiment of a User Contact Routine 1700.

DETAILED DESCRIPTION

The following Detailed Description provides specific details for an understanding of various examples of the technology. One skilled in the art will understand that the technology may be practiced without many of these details. In some instances, structures and functions have not been shown or described in detail or at all to avoid unnecessarily obscuring the description of the examples of the technology. It is intended that the terminology used in the description presented below be interpreted in its broadest reasonable manner, even though it is being used in conjunction with a detailed description of certain examples of the technology. Although certain terms may be emphasized below, any terminology intended to be interpreted in any restricted manner will be overtly and specifically defined as such in this Detailed Description section.

Unless the context clearly requires otherwise, throughout the description and the claims, the words “comprise,” “comprising,” and the like are to be construed in an inclusive sense, as opposed to an exclusive or exhaustive sense; that is to say, in the sense of “including, but not limited to.” As used herein, the term “connected,” “coupled,” or any variant thereof means any connection or coupling, either direct or indirect between two or more elements; the coupling or connection between the elements can be physical, logical, or a combination thereof. Additionally, the words, “herein,” “above,” “below,” and words of similar import, when used in this application, shall refer to this application as a whole and not to particular portions of this application. When the context permits, words using the singular may also include the plural while words using the plural may also include the singular. The word “or,” in reference to a list of two or more items, covers all of the following interpretations of the word: any of the items in the list, all of the items in the list, and any combination of one or more of the items in the list.

Certain elements appear in various of the Figures with the same capitalized element text, but a different element number. When referred to herein with the capitalized element text but with no element number, these references should be understood to be largely equivalent and to refer to any of the elements with the same capitalized element text, though potentially with differences based on the computing device within which the various embodiments of the element appear.

As used herein, a Uniform Resource Identifier (“URI”) is a string of characters used to identify a resource on a computing device and/or a network, such as the Internet. Such identification enables interaction with representations of the resource using specific protocols. “Schemes” specifying a syntax and associated protocols define each URI.

The generic syntax for URI schemes is defined in Request for Comments (“RFC”) memorandum 3986 published by the Internet Engineering Task Force (“IETF”). According to RFC 3986, a URI (including a URL) consists of four parts:

    <scheme name>: <hierarchical part> [?<query>] [#<fragment>]

A URI begins with a scheme name that refers to a specification for assigning identifiers within that scheme. The scheme name consists of a letter followed by any combination of letters, digits, and the plus (“+”), period (“.”), or hyphen (“-”) characters; and is terminated by a colon (“:”).

The hierarchical portion of the URI is intended to hold identification information that is hierarchical in nature. Often this part is delineated with a double forward slash (“//”), followed by an optional authority part and an optional path.

The optional authority part holds an optional user information part (not shown) terminated with “@” (e.g. username:password@), a hostname (i.e., domain name or IP address, here “example.com”), and an optional port number preceded by a colon “:”.

The path part is a sequence of one or more segments (conceptually similar to directories, though not necessarily representing them) separated by a forward slash (“/”). If a URI includes an authority part, then the path part may be empty.

The optional query portion is delineated with a question mark and contains additional identification information that is not necessarily hierarchical in nature. Together, the path part and the query portion identify a resource within the scope of the URI's scheme and authority.
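As an illustration of the parts described above, the following minimal sketch splits a URI into its components using the Python standard library. The URI shown is a hypothetical example chosen for this sketch and does not appear elsewhere in this disclosure.

```python
from urllib.parse import urlsplit

# Hypothetical URI used only to illustrate the component names.
uri = "https://user:pw@example.com:8080/shoes/mens?brand=asics&size=8#reviews"
parts = urlsplit(uri)

scheme = parts.scheme        # scheme name, without the terminating colon
user_info = parts.username   # user information part, before the ":" and "@"
host = parts.hostname        # hostname within the authority part
port = parts.port            # optional port number, as an integer
path = parts.path            # path segments separated by "/"
query = parts.query          # query portion, without the leading "?"
fragment = parts.fragment    # fragment, without the leading "#"

print(scheme, user_info, host, port, path, query, fragment)
```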

The query string syntax is not generically defined, but is commonly organized as a sequence of zero or more <key>=<value> pairs separated by a semicolon or ampersand, for example:

    key1=value1;key2=value2;key3=value3 (Semicolon), or
    key1=value1&key2=value2&key3=value3 (Ampersand)
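The two separator conventions above can be handled by one small parser. The following is a minimal sketch; the function name parse_query is an assumption made for illustration, and the sketch does not perform percent-decoding.

```python
import re

def parse_query(query):
    """Split a query string into (key, value) pairs on ';' or '&'.

    Hypothetical helper for illustration; percent-decoding is omitted.
    """
    pairs = []
    for item in re.split(r"[;&]", query):
        if not item:
            continue  # skip empty segments, e.g. from a trailing separator
        key, _, value = item.partition("=")
        pairs.append((key, value))
    return pairs

print(parse_query("key1=value1;key2=value2;key3=value3"))
print(parse_query("key1=value1&key2=value2&key3=value3"))
```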

Much of the above information is taken from RFC 3986, which provides additional information related to the syntax and structure of URIs. RFC 3986 is hereby incorporated by reference, for all purposes.

As used herein, “Product” shall be understood to mean “products or services.” References to “Product Attribute” herein shall be understood to mean “product or service attribute.” As used herein, “Products” are associated with iPIDs.

As used herein, an “iPID” or iPID 330 is a unique identifier assigned within the Indix System to a URI for a product, such as URI 305. The iPID 330 may be, for example, a hash of URI 305. When multiple URIs 305 from a common base domain name lead to webpages which, when parsed for Price and Product Attributes, produce the same Parse Result 325 (notwithstanding that the webpages may contain other Content which does not contribute to the Parse Result 325), the corresponding iPIDs 330 may be labeled as equivalent in, for example, the Equivalent iPID 334 record and may be treated as the same iPID 330.
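One way to picture the iPID and its equivalence labeling is the following minimal sketch. It assumes SHA-256 as the hash (the disclosure does not name a hash function), uses the URI hostname as a stand-in for the base domain name, and represents a Parse Result as a plain dictionary; the function names are hypothetical.

```python
import hashlib
import json
from urllib.parse import urlsplit

def make_ipid(uri):
    """Derive an identifier from a URI. SHA-256 is an assumed choice of hash."""
    return hashlib.sha256(uri.encode("utf-8")).hexdigest()

def equivalent_ipids(pages):
    """Map each iPID to a canonical iPID.

    pages: iterable of (uri, parse_result) where parse_result is a dict.
    URIs on the same host with identical parse results share one canonical iPID.
    """
    canonical = {}
    mapping = {}
    for uri, parse_result in pages:
        ipid = make_ipid(uri)
        host = urlsplit(uri).hostname  # stand-in for the base domain name
        key = (host, json.dumps(parse_result, sort_keys=True))
        canonical.setdefault(key, ipid)  # first iPID seen becomes canonical
        mapping[ipid] = canonical[key]
    return mapping
```

For example, two hypothetical URIs on one host that differ only in a tracking parameter, and that parse to the same price and attributes, would map to the same canonical identifier.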

As used herein, a “Master iPID” or “MPID” or MPID 332 is an iPID 330 assigned to a group of iPIDs 330 derived from URIs 305 which lead to webpages offering the same Product for sale. An MPID is generally meant to identify a single Product, generally produced by a common manufacturer, though the Product may be distributed and sold by multiple parties.

iPIDs and MPIDs are associated with Price Attribute 340 records and Product Attribute 345 records.

A Price Attribute 340 record may comprise one or more records comprising, for example, values which encode an iPRID which may be an identifier for a price observed at a particular time, an iPID (discussed above), a Product Name (a “Product Name” value in this record may also be referred to herein as a “Product”), a Standard Price, a Sale, a Price, a Rebate amount, a Price Instructions record (containing special instructions relating to a price, such as that the price only applies to students), a Currency Type, a Date and Time Stamp, a Tax record, a Shipping record (indicating costs relating to shipping to different locations, whether tax is calculated on shipping costs, etc.), a Price Validity Start Date, a Price Validity End Date, a Quantity, a Unit of Measure Type, a Unit of Measure Value, a Merchant Name (with the name of a merchant from whom the Product is available; a “Merchant Name” value in this record may also be referred to herein as a “Merchant”), a Store Name (a Merchant may have multiple stores; a “Store Name” value in this record may also be referred to herein as a “Store”), a User ID, a Data Channel (indicating the source of the Price Attribute 340 record, such as an online crawl, a crowdsource, a licensed supplier of price information, or from a merchant), a Source Details record (for example, indicating a URI, a newspaper advertisement), an Availability Flag, a Promotion Code, a Bundle Details record (indicating products which are part of a bundle), a Condition Type record (indicating new, used, poor, good, and similar), a Social Rank record (indicating a rank of “likes” and similar of the price), a Votes/Likes record (indicating a number of “likes” and similar which a Price or Product has received), a Price Rank record, a Visibility Indicator record (indicating whether the price is visible to the public, whether it is only visible to a Merchant, or the like), a Supply Chain Reference record (indicating whether the price was obtained from a retailer, a wholesaler, or another party in a supply chain), a Sale Location (indicating a geographic location where the product is available at the price), a Manufactured Location record (indicating where the product was produced or manufactured), a Launch Date record (indicating how long the product has been on the market), and an Age of Product record (indicating how long the product was used by the user). When capitalized herein, the foregoing terms (such as Product, Price, Merchant, Store, Source Details, etc.) are meant to refer to values in a Price Attribute 340 record.
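For readers who prefer a concrete shape, a small subset of the Price Attribute 340 values listed above could be modeled as follows. This is only a sketch: the class name, the field names, the choice of which fields are optional, and the sample values are assumptions made for illustration, and most of the listed values are omitted.

```python
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal
from typing import Optional

@dataclass(frozen=True)
class PriceAttribute:
    """Hypothetical subset of a Price Attribute record."""
    iprid: str                # identifier for one observed price
    ipid: str                 # identifier of the product URI
    product_name: str
    standard_price: Decimal   # Decimal avoids binary floating-point rounding
    currency_type: str
    timestamp: datetime       # Date and Time Stamp of the observation
    merchant_name: Optional[str] = None
    store_name: Optional[str] = None
    data_channel: Optional[str] = None   # e.g. an online crawl
    availability_flag: Optional[bool] = None

# Sample values are invented for this sketch.
record = PriceAttribute(
    iprid="price-0001",
    ipid="ipid-abc",
    product_name="Sterling Silver Diamond & Blue Topaz Ring",
    standard_price=Decimal("129.00"),
    currency_type="USD",
    timestamp=datetime(2013, 7, 25, 12, 0),
    data_channel="online crawl",
)
```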

A Product Attribute 345 record may comprise, for example, values encoding features of or describing a Product. The entire Product Attribute 345 schema may comprise thousands of columns, though only tens or hundreds of the columns may be applicable to any given Product. An example set of values in a Product Attribute 345 record for a ring is as follows: Title, “Sterling Silver Diamond & Blue Topaz Ring;” Brand, “Blue Nile;” Category (such as, for example, a Category 335 in a category schema), “rings;” Metal Name, “silver;” Stone Shape, “cushion;” Stone Name, “topaz;” Width, “3 mm;” Stone Color, “blue;” Product Type, “rings;” Birthstone, “September;” and Setting Type, “prong.” An example set of Product Attributes 345 for a shoe is as follows: Brand, “Asics;” Category (such as, for example, a Category 335 in a category schema or taxonomy), “Men's Sneakers & Athletic;” Shoe Size, “8;” Product Type, “wrestling shoes;” Color, “black;” Shoe Style, “sneakers;” Sports, “athletic;” Upper Material, “mesh.” When capitalized herein, the foregoing terms (such as Brand, Category, Metal Name, Product Type, etc.) are meant to refer to values in a Product Attribute 345 record.

As used herein, “Content” comprises text, graphics, images (including still and video images), audio, graphical arrangement, and instructions for graphical arrangement, including HTML and CSS instructions which may, for example, be interpreted by browser applications.

As used herein, “Event” is information generally in news or current events. Events may be found in Content. Listing Pages, Product Pages, and Event Pages are all examples of Webpage Types 350.

As used herein, “PriceDNA” comprises a Product Attribute 345 record, one or more Price Attribute 340 records, the output of the Core Price Routine 500 (generally found in the Core Price 380 records), and the output of the Insights Routine 600 (generally found in the Insights 375 records).

As used herein, a “Brand” is a family or group of Products sold by or under a common trademark, such as the “Nike®” Brand, which sells under this trademark a family of shoes, exercise equipment, and other apparel. Brand is a value within a Product Attribute 345 record.

As used herein, a “Store” is an online or physical sales venue. A Store is a value within a Price Attribute 340 record.

As used herein, a “Merchant” is an operator of one or more Stores. A Merchant is a value in a Price Attribute 340 record.

Generally, an Analytics Routine 400 obtains Price Attribute 340 and Product Attribute 345 records from the Indix Database 300 shortly after the records are produced following a crawl of webpages accessed via the URIs 305. The Analytics Routine 400 merges the records, performs a Core Price Routine 500 to develop core price information, such as changes in price, and exports the records and the result of the Core Price Routine 500 to a sequential file which is indexed. The result of the Core Price Routine 500 may be searched and accessed by users in close to real-time. The Analytics Routine 400 also performs an Insights Routine 600. The Insights Routine 600 comprises a set of sub-routines for deriving additional information from the Price Attribute 340 and Product Attribute 345 records and from the output of the Core Price Routine 500. Generally, the Insights Routine 600 identifies what Product Attributes 345 and Price Attributes 340 across the datasets are associated with the changes in price. The output of the Insights Routine 600 is also stored in the Indix Database 300 and may be searched and accessed by users, though the accessible values may be refreshed more slowly than the data from the Core Price Routine 500. A User Contact Routine 1700 allows users to search and obtain information and to set alerts relative to the information in the Indix Database 300.

FIG. 1 is a network and device diagram illustrating exemplary computing devices configured according to embodiments disclosed in this paper. Illustrated in FIG. 1 are an Indix Server 200 and an Indix Database 300. The Indix Database 300 is discussed further in relation to FIG. 3.

Also illustrated in FIG. 1 is a Crawl Agent 400, representing Crawl Agents 1 to N, and a Crawl Agent Database 500. The Crawl Agent 400 and Crawl Agent Database 500 are used to crawl webpages accessed via the URIs 305.

Also illustrated in FIG. 1 is a Client Device 105, such as a mobile or non-mobile computer device. The Client Device 105 is an example of computing devices such as, for example, a mobile phone, a tablet, laptop, personal computer, gaming computer, or media playback computer. The Client Device 105 represents any computing device capable of rendering Content in a browser or an equivalent user-interface. Client Devices are used by “users.” The Client Device 105 may interact with the User Contact Routine 1700.

Also illustrated in FIG. 1 is a Web Server 115, which may serve Content in the form of webpages or equivalent output in response to URIs, such as URI 305.

Also illustrated in FIG. 1 is an Ecommerce Platform 160, which may provide ecommerce services, such as website and/or webpage hosting via webpage templates comprising HTML and CSS elements. Customers of Ecommerce Platform 160 may complete the webpage templates with Content and serve the webpages and websites from, for example, Web Server 115.

Interaction among devices illustrated in FIG. 1 may be accomplished, for example, through the use of credentials to authenticate and authorize a machine or user with respect to other machines.

In FIG. 1, the computing machines may be physically separate computing devices or logically separate processes executed by a common computing device. Certain components are illustrated in FIG. 1 as connecting directly to one another (such as, for example, the Indix Database 300 to the Indix Server 200), though the connections may be through the Network 150. If these components are embodied in separate computers, then additional steps may be added to the disclosed invention to recite communicating between the components.

The Network 150 comprises computers, network connections among the computers, and software routines to enable communication between the computers over the network connections. Examples of the Network 150 comprise an Ethernet network, the Internet, and/or a wireless network, such as a GSM, TDMA, CDMA, EDGE, HSPA, LTE or other network provided by a wireless service provider, or a television broadcast facility. Connection to the Network 150 may be via a Wi-Fi connection. More than one network may be involved in a communication session between the illustrated devices. Connection to the Network 150 may require that the computers execute software routines which enable, for example, the seven layers of the OSI model of computer networking or equivalent in a wireless phone network.

This paper may discuss a first computer as connecting to a second computer (such as a Crawl Agent 400 connecting to the Indix Server 200) or to a corresponding datastore (such as to Indix Database 300); it should be understood that such connections may be to, through, or via the other of the two components (for example, a statement that a computing device connects with or sends data to the Indix Server 200 should be understood as saying that the computing device may connect with or send data to the Indix Database 300). References herein to “database” should be understood as equivalent to “datastore.” Although illustrated as components integrated in one physical unit, the computers and databases may be provided by common (or separate) physical hardware and common (or separate) logic processors and memory components. Though discussed as occurring within one computing device, the software routines and data groups used by the software routines may be stored and/or executed remotely relative to any of the computers through, for example, application virtualization.

FIG. 2 is a functional block diagram of an exemplary Indix Server 200 computing device and some data structures and/or components thereof. The Indix Server 200 in FIG. 2 comprises at least one Processing Unit 210, Indix Server Memory 250, a Display 240 and Input 245, all interconnected along with the Network Interface 230 via a Bus 220. The Processing Unit 210 may comprise one or more general-purpose Central Processing Units (“CPU”) 212 as well as one or more special-purpose Graphics Processing Units (“GPU”) 214. The components of the Processing Unit 210 may be utilized by the Operating System 255 for different functions required by the routines executed by the Indix Server 200. The Network Interface 230 may be utilized to form connections with the Network 150 or to form device-to-device connections with other computers. The Indix Server Memory 250 generally comprises a random access memory (“RAM”), a read only memory (“ROM”), and a permanent mass storage device, such as a disk drive or SDRAM (synchronous dynamic random-access memory).

The Indix Server Memory 250 stores program code for software routines, such as, for example, Analytics Routine 400, Core Price Routine 500, Insights Routine 600, Volatility Routine 700, Substitution Routine 800, Mix Routine 900, Prediction Routine 1000, Competition Routine 1100, Promotion Routine 1200, Leadership Routine 1300, Premium Routine 1400, Price Range Routine 1500, Reach Routine 1600, and User Contact Routine 1700 as well as, for example, browser, email client and server routines, client applications, and database applications (discussed further below). Additional data groups for routines, such as for a webserver and web browser, may also be present on and executed by the Indix Server 200 and the other computers illustrated in FIG. 1. Webserver and browser routines may provide an interface for interaction among the computing devices, for example, through webserver and web browser routines which may serve and respond to data and information in the form of webpages and html documents or files. The browsers and webservers are meant to illustrate machine- and user-interface and user-interface enabling routines generally, and may be replaced by equivalent routines for serving and rendering information to and in interfaces in a computing device (whether in a web browser or in, for example, a mobile device application).

In addition, the Indix Server Memory 250 also stores an Operating System 255. These software components may be loaded from a non-transient Computer Readable Storage Medium 295 into Indix Server Memory 250 of the computing device using a drive mechanism (not shown) associated with a non-transient Computer Readable Storage Medium 295, such as a floppy disc, tape, DVD/CD-ROM drive, memory card, or other like storage medium. In some embodiments, software components may also or instead be loaded via a mechanism other than a drive mechanism and Computer Readable Storage Medium 295 (e.g., via Network Interface 230).

The computing device 200 may also comprise hardware supporting input modalities, Input 245, such as, for example, a touchscreen, a camera, a keyboard, a mouse, a trackball, a stylus, motion detectors, and a microphone. The Input 245 may also serve as a Display 240, as in the case of a touchscreen display which also serves as Input 245, and which may respond to input in the form of contact by a finger or stylus with the surface of the Input 245.

The computing device 200 may also comprise or communicate via Bus 220 with Indix Datastore 300, illustrated further in FIG. 3. In various embodiments, Bus 220 may comprise a storage area network (“SAN”), a high speed serial bus, and/or other suitable communication technology. In some embodiments, the Indix Server 200 may communicate with the Indix Datastore 300 via Network Interface 230. The Indix Server 200 may, in some embodiments, include many more components than those shown in this Figure. However, it is not necessary that all of these generally conventional components be shown in order to disclose an illustrative embodiment.

FIG. 3 is a functional block diagram of the Indix Datastore 300 illustrated in the computing device of FIG. 2. The components of the Indix Datastore 300 are data groups used by routines and are discussed further herein in the discussion of other of the Figures. The data groups used by routines illustrated in FIG. 3 may be represented by a cell in a column or a value separated from other values in a defined structure in a digital document or file. Though referred to herein as individual records or entries, the records may comprise more than one database entry. The database entries may be, represent, or encode numbers, numerical operators, binary values, logical values, text, string operators, joins, conditional logic, tests, and similar.

FIG. 4 is a flowchart illustrating an embodiment of an Analytics Routine 400. The Analytics Routine 400 may be performed by, for example, the Indix Server 200. At box 405, the Analytics Routine 400 obtains a new set of Price Attribute 340 records and a new set of Product Attribute 345 records, with an assigned MPID 332 and Category 335. This may occur as frequently as URIs 305 are crawled and the webpages therefrom are parsed into Parse Results 325 comprising new Price Attribute 340 and Product Attribute 345 records with an assigned MPID 332 and Category 335.

At box 500, the Analytics Routine 400 performs the Core Price Routine 500 (discussed further below). At box 410, for each iPID 330 associated with a Price Attribute 340 or Product Attribute 345 record in box 405, the Analytics Routine 400 appends the then-current Price Attribute 340 record (of box 405) to a set of Price Attribute 340 records associated with each iPID 330 (each iPID 330 may be associated with a set of Price Attribute 340 records). At box 415, for each iPID 330 associated with a Price Attribute 340 or Product Attribute 345 record in box 405, the Analytics Routine 400 merges the then-current Product Attribute 345 record (of box 405) into a Product Attribute 345 record associated with each iPID 330 (each iPID 330 may be associated with one Product Attribute 345 record). In this merger, new values overwrite old values unless the old record is longer or unless the old record otherwise is judged to be of higher quality (such as if the old record uses fewer words, but the words are less common than the words in the new record); if a new record does not have a value where an old value exists, the old value may be left.
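The merge rule of box 415 can be sketched as follows. This is a minimal illustration under stated assumptions: records are plain dictionaries, the function name is hypothetical, and only the length test is implemented as the quality heuristic; the word-rarity test mentioned above is not modeled.

```python
def merge_product_attributes(old, new):
    """Merge a new Product Attribute record into an old one.

    Assumption: a longer old value is treated as higher quality.
    """
    merged = dict(old)
    for key, new_value in new.items():
        if new_value in (None, ""):
            continue  # no new value: leave the old value in place
        old_value = merged.get(key)
        if old_value and len(str(old_value)) > len(str(new_value)):
            continue  # old value is longer: keep it
        merged[key] = new_value  # otherwise the new value overwrites
    return merged

old = {"Title": "Sterling Silver Diamond & Blue Topaz Ring",
       "Brand": "Blue Nile", "Width": "3 mm"}
new = {"Title": "Topaz Ring", "Brand": "Blue Nile Inc.",
       "Stone Color": "blue", "Width": ""}
merged = merge_product_attributes(old, new)
```

In this example the longer old Title and the old Width survive, the longer new Brand overwrites the old one, and the new Stone Color is added.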

At box 420, the outputs of the Core Price Routine 500 and of boxes 410 and 415 are written to a Sequential File 365 record, which Sequential File 365 record is stored, for example, in the Indix Database 300 and which Sequential File 365 is indexed, for example, to allow the contents of the Sequential File 365 to be searched and values in it accessed as it and the index are updated. Updates may occur, for example, in close to real-time, following crawl of a webpage and output of new Price Attribute 340 and Product Attribute 345 records.
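One simple way to picture an indexed sequential file is an append-only log with a lookup table of byte offsets, updated on every append so that new records are searchable immediately. The sketch below is an assumption-laden illustration: the class name, the JSON-lines encoding, and the in-memory index are hypothetical choices, not details taken from this disclosure.

```python
import io
import json

class SequentialFile:
    """Append-only record log with an iPID -> byte offset index (illustrative)."""

    def __init__(self, stream=None):
        # An in-memory stream is used here; a real file opened in "r+b" mode
        # would behave the same way.
        self._stream = stream if stream is not None else io.BytesIO()
        self._index = {}  # iPID -> offset of the most recent record

    def append(self, ipid, record):
        self._stream.seek(0, io.SEEK_END)
        offset = self._stream.tell()
        line = json.dumps({"ipid": ipid, **record}) + "\n"
        self._stream.write(line.encode("utf-8"))
        self._index[ipid] = offset  # index is updated together with the file

    def latest(self, ipid):
        offset = self._index.get(ipid)
        if offset is None:
            return None
        self._stream.seek(offset)
        return json.loads(self._stream.readline())

sf = SequentialFile()
sf.append("a", {"price": 10})
sf.append("b", {"price": 5})
sf.append("a", {"price": 9})
```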

At box 600, the Analytics Routine 400 performs the Insights Routine 600 utilizing and expanding upon the output of the Core Price Routine 500 and the boxes, above. Generally, the Insights Routine 600 identifies what Product Attributes 345 and Price Attributes 340 across the datasets are associated with the changes in price. At box 435 the Analytics Routine 400 stores the output of the Insights Routine 600 in the Indix Database 300 as Insights 375. At box 1700, the Analytics Routine 400 performs the User Contact Routine 1700. Utilizing the User Contact Routine 1700, users may query the Indix Database 300 and set alerts.

FIG. 5 is a flowchart illustrating an embodiment of a Core Price Routine 500. Boxes 505 through 540 may iterate for each new Price Attribute 340 record associated with an iPID 330. At box 510 all Price Attribute 340 records associated with the iPID 330, including the new Price Attribute 340 record of box 405 and historic records (and/or summary values derived therefrom), may be obtained. At box 515, the high, low, average, mean, magnitude, and number of price values over several time periods for the iPID 330 may be calculated. A default time period may be 45 or 30 days, though these values may be calculated for several time periods. The output may be saved, for example, to the Core Price 380 records, and indexed.
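The calculation of box 515 can be sketched as a summary over a trailing time window. In this minimal illustration the function name and input shape are hypothetical, and “magnitude” is assumed to mean the difference between the high and the low, which the disclosure does not define.

```python
from datetime import datetime, timedelta
from statistics import mean

def core_price_stats(observations, now, window_days=30):
    """Summarize prices observed within the trailing window.

    observations: list of (timestamp, price) pairs for one iPID.
    Returns None when no price falls inside the window.
    """
    cutoff = now - timedelta(days=window_days)
    prices = [price for stamp, price in observations if cutoff <= stamp <= now]
    if not prices:
        return None
    return {
        "high": max(prices),
        "low": min(prices),
        "mean": mean(prices),
        "magnitude": max(prices) - min(prices),  # assumed definition
        "count": len(prices),
    }

now = datetime(2013, 7, 25)
observations = [
    (now - timedelta(days=40), 100.0),
    (now - timedelta(days=20), 90.0),
    (now - timedelta(days=10), 80.0),
    (now - timedelta(days=1), 85.0),
]
stats_30 = core_price_stats(observations, now, window_days=30)
stats_45 = core_price_stats(observations, now, window_days=45)
```

Calling the function once per window length, as in the last two lines, corresponds to calculating the values for several time periods.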

At box 520, an MPID 332 associated with the iPID 330 may be obtained. At box 525, the high, low, average, mean, magnitude, and number of price values over several time periods may be calculated for the MPID 332 utilizing the new value associated with the iPID 330 from box 515. The iPID 330 may be a hash of a URI 305 and the result of box 515 is thus limited to a particular sales channel (typically a Store or Merchant) for a particular Product (taking into account that duplicate iPIDs 330 from a base domain name may be treated as equivalent); the MPID 332 is assigned to all iPIDs 330 which represent the same Product, so the MPID version of this calculation in box 525 returns values relating to the Product across Stores, Merchants, Locations, etc. The calculation of box 525 may return values which are or may be sorted by, for example, Store, Merchant, Location (such as Region), and by time periods such as a Season. The output may be saved, for example, to the Core Price 380 records, and indexed.
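The difference between the per-iPID and per-MPID calculations can be illustrated by pooling the prices of every iPID that maps to the same MPID before summarizing. The function name, the input dictionaries, and the sample identifiers below are assumptions made for this sketch; time windows and sorting by Store or Merchant are omitted for brevity.

```python
from collections import defaultdict

def mpid_stats(ipid_prices, ipid_to_mpid):
    """Summarize prices per MPID by pooling the prices of its iPIDs.

    ipid_prices: {ipid: [price, ...]}; ipid_to_mpid: {ipid: mpid}.
    """
    pooled = defaultdict(list)
    for ipid, prices in ipid_prices.items():
        pooled[ipid_to_mpid[ipid]].extend(prices)
    return {
        mpid: {
            "high": max(prices),
            "low": min(prices),
            "mean": sum(prices) / len(prices),
            "count": len(prices),
        }
        for mpid, prices in pooled.items()
        if prices
    }

# Two stores ("a", "b") offer the same Product; "c" is a different Product.
ipid_prices = {"a": [10.0, 12.0], "b": [8.0], "c": [50.0]}
ipid_to_mpid = {"a": "M1", "b": "M1", "c": "M2"}
by_mpid = mpid_stats(ipid_prices, ipid_to_mpid)
```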

At box 535, all calculations and other routines which utilize the values for the iPID 330 and the associated MPID 332 may insert the new values calculated for the iPID 330 and the MPID 332 and may recalculate the values. For example, the high, low, average, mean, magnitude, and number of price changes over time periods by Category 335, such as a Category 335 associated with the iPID 330, may be calculated. The output may be saved, for example, to the Core Price 380 records, and indexed.

Calculations or other routines which utilize the values calculated in FIG. 5 may refer to data addresses. The Core Price Routine 500 may update the values stored at these data addresses, which causes the calculations or other routines to update their output, when such calculations or other routines are (re)executed, such as on a schedule or on the occurrence of an event.

At box 599, the Core Price Routine 500 may return, for example, to the Analytics Routine 400.

FIG. 6 is a flowchart illustrating an embodiment of an Insights Routine 600.

The Insights Routine 600 may perform one or more of a set of sub-routines. At box 700, a Volatility Routine 700 may be performed to determine the volatility of prices relative to the many dimensions available in the PriceDNA. At box 800, a Substitution Routine 800 determines substitutes for an iPID 330, MPID 332, or Category 335. At box 900, a Mix Routine 900 determines “how many” relative to the many dimensions available in the PriceDNA. At box 1000, a Prediction Routine 1000 makes price predictions relative to the many dimensions available in the PriceDNA. At box 1100, a Competition Routine 1100 determines competitors relative to a Product, Store, or Brand. At box 1200, a Promotion Routine 1200 determines promotions relative to Products, Stores, Brand, Seasons, and other dimensions available in the PriceDNA. At box 1300, a Leadership Routine 1300 determines which Products lead or follow others in terms of price changes. At box 1400, a Premium Routine 1400 determines which Products in a Category 335 charge higher (premium) prices. At box 1500, a Price Range Routine 1500 determines the number of price ranges and maximum and minimum for iPIDs, MPIDs, and categories. At box 1600, a Reach Routine 1600 determines the reach of an iPID or MPID in terms of the number of people who visit a sales venue.

FIG. 7 is a flowchart illustrating an embodiment of a Volatility Routine 700. At box 705, the Prices associated with an iPID 330 over a time period, such as 30 days, may be obtained, such as from the Core Price 380 records. At box 710, the number of price changes within the time period may be determined (if this was not already a value in the Core Price 380 records). At box 715, the number of price changes within the time period (“VBF”) may be determined relative to, for example, the iPID 330, relative to an MPID 332 associated with the iPID 330, relative to a Brand, relative to a Region, relative to a Price Band by MPID 332, relative to a Category 335, and relative to all iPIDs 330 associated with a Merchant. The values may be saved and indexed to accelerate access to and/or enable searching for the values and/or the values may be calculated on an as-needed basis. The values may be saved to the Insights 375 records.

At box 720, the benchmark number of Price changes in the period of time may be determined. The benchmark may be, for example, the VBF relative to additional criteria, such as, for example, the VBF for a Product (or MPID), plus 1, divided by the maximum VBF of other Products in the same Category as the Product, multiplied by 100 over 101. The benchmark VBF for a Category may be determined by the VBF for the Category, plus 1, divided by the maximum VBF of the Category, multiplied by 100 over 101. The benchmark VBF for a Merchant may be the VBF of the Merchant, plus 1, divided by the maximum VBF of the Merchant, multiplied by 100 over 101. The benchmark VBF for a Brand may be the VBF of the Brand, plus 1, divided by the maximum VBF of the Brand, multiplied by 100 over 101. The values may be saved to the Insights 375 records.
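As a hedged illustration, the count of box 710 and the benchmark of box 720 might be computed as below. The function names are hypothetical, and the benchmark follows a literal reading of the text, (VBF + 1) / (maximum VBF) × 100/101; other readings of the grouping are possible.

```python
def count_price_changes(prices):
    """Number of times a price differs from the one before it (the VBF)."""
    return sum(1 for prev, cur in zip(prices, prices[1:]) if cur != prev)

def benchmark_vbf(vbf, max_vbf):
    """Literal reading of the text: (VBF + 1) / max VBF * 100 / 101."""
    if max_vbf <= 0:
        raise ValueError("max_vbf must be positive")
    return (vbf + 1) / max_vbf * 100 / 101

vbf = count_price_changes([10, 10, 12, 11, 11])  # two changes in the period
score = benchmark_vbf(vbf, max_vbf=4)
```

The same function may be applied at the Product, Category, Merchant, or Brand level by passing the VBF and maximum VBF for that level.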

FIGS. 8A-8C are flowcharts illustrating embodiments of a Substitution Routine 800. In a first example of an embodiment of a Substitution Routine 800 illustrated in FIG. 8A, substitute Products within a Category 335 are identified. At box 801, which, like other steps, may be optional, a Product may be identified by, for example, a user or a routine, and the MPID 332 corresponding thereto may be obtained. At box 805, a Category 335 may be obtained, whether corresponding to the Product and MPID of box 801 or via a user query or other input, and all MPIDs 332 within the Category 335 may be obtained. At box 810 a Price Band may be obtained or calculated relative to the Category 335 (such as from or according to the Price Range Routine 1500); the Price Band may be selected by a user. Boxes 815 through 830 may iterate for each iPID 330 within the Category of box 805.

At box 820, the iPIDs 330 in the Category of box 805 and with a Price value within the Price Band of box 810 are identified, such as from the Core Price 380 records. At box 825, the result of box 820 may be subdivided, grouped, or filtered by Region, Time, Used/New, and according to other dimensions available in the PriceDNA. At box 830 the Substitution Routine 800 may iterate over the remaining iPIDs 330 in the Category 335. At box 835, the results may be saved as Substitutes, such as to the Insights 375 records. At box 839, the process may return.
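The selection of boxes 820 and 825 might, for example, be sketched as follows. The offer dictionaries, their keys, and the function name are hypothetical and chosen only to make the example self-contained.

```python
from collections import defaultdict

def substitutes_in_band(offers, category, band, group_key=None):
    """Return iPIDs in the category priced within band, optionally grouped."""
    low, high = band
    hits = [o for o in offers
            if o["category"] == category and low <= o["price"] <= high]
    if group_key is None:
        return [o["ipid"] for o in hits]
    grouped = defaultdict(list)
    for o in hits:
        grouped[o[group_key]].append(o["ipid"])
    return dict(grouped)

offers = [
    {"ipid": "a", "category": "tv", "price": 300, "region": "west"},
    {"ipid": "b", "category": "tv", "price": 450, "region": "east"},
    {"ipid": "c", "category": "tv", "price": 900, "region": "west"},
    {"ipid": "d", "category": "radio", "price": 320, "region": "west"},
]
flat = substitutes_in_band(offers, "tv", (250, 500))
by_region = substitutes_in_band(offers, "tv", (250, 500), group_key="region")
```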

In a second example of an embodiment of a Substitution Routine 800 illustrated in FIG. 8B, substitute Products within a Category 335 with a percentage overlap in Attributes 340/345 and within a Price Band are identified. At box 840, a Category 335 may be obtained, whether corresponding to a Product or via a user query or other input, and all MPIDs 332 within the Category 335 may be obtained. At box 845, the Product Attributes 345 of all iPIDs 330 within the MPIDs 332 may be obtained. At box 850, the Product Attributes 345 may be clustered to identify the iPIDs 330 with at least a 50% Product Attribute 345 match or overlap. At box 855 a Price Band may be obtained or calculated relative to the Category 335 (such as from or according to the Price Range Routine 1500); the Price Band may be selected by a user.

Boxes 860 through 870 may iterate for each iPID 330 within the MPIDs 332 and Attribute 345 match of box 850. At box 865, the iPIDs 330 with a Price value within the Price Band of box 855 and with the Product Attribute 345 match or overlap of box 850 are identified. The result of box 865 may be subdivided or grouped further by sub-Price Ranges. At box 870 the Substitution Routine 800 may iterate over the remaining iPIDs 330 in the MPIDs 332 within the Category 335. At box 871, the results may be saved as Substitutes in the Insights 375 records. At box 874, the process may return.
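One possible way to approximate the attribute match of box 850 and the selection of box 865 is sketched below. The disclosure does not define how overlap is measured; this example assumes it is the share of a reference product's attribute key/value pairs that a candidate also carries, and it uses a simple pairwise comparison in place of clustering. All names are hypothetical.

```python
def attribute_overlap(reference, candidate):
    """Share of the reference's key/value pairs also present in the candidate."""
    if not reference:
        return 0.0
    shared = sum(1 for key, value in reference.items()
                 if candidate.get(key) == value)
    return shared / len(reference)

def overlapping_substitutes(target, candidates, band, threshold=0.5):
    """iPIDs priced within band whose attributes overlap the target's by >= threshold."""
    low, high = band
    return [c["ipid"] for c in candidates
            if low <= c["price"] <= high
            and attribute_overlap(target["attrs"], c["attrs"]) >= threshold]

target = {"ipid": "t", "price": 100,
          "attrs": {"brand": "X", "color": "red", "size": "M", "type": "shirt"}}
candidates = [
    {"ipid": "c1", "price": 110,
     "attrs": {"brand": "X", "color": "red", "size": "L", "type": "coat"}},
    {"ipid": "c2", "price": 105,
     "attrs": {"brand": "Y", "color": "red", "size": "L", "type": "coat"}},
    {"ipid": "c3", "price": 500, "attrs": dict(target["attrs"])},
]
matches = overlapping_substitutes(target, candidates, (80, 120))
```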

In a third example of an embodiment of a Substitution Routine 800 illustrated in FIG. 8C, substitute Products within a Category 335 with a percentage overlap in Attributes 340/345 and in the top or bottom of a Price Range are identified. At box 875, a Category 335 may be obtained, whether corresponding to a Product or via a user query or other input, and all MPIDs 332 within the Category 335 may be obtained. At box 880, the Product Attributes 345 of all iPIDs 330 within the MPIDs 332 may be obtained. At box 885, the Product Attributes 345 may be clustered to identify the iPIDs 330 with at least a 50% Product Attribute 345 match or overlap.

Boxes 890 through 897 may iterate for each iPID 330 within the MPIDs 332 and Attribute 345 match of box 885. At box 895, the iPIDs 330 with the Product Attribute 345 match or overlap of box 885 and in the bottom of a Price Range or Price Band relative to the starting iPID 330 are identified. At box 896 the top or bottom five (or another subset) of box 895 may be selected. At box 897 this embodiment of the Substitution Routine 800 may iterate over the remaining iPIDs 330 in the MPIDs 332 within the Category 335. At box 898, the results may be saved as Substitutes in the Insights 375 records. At box 899, the process may return.

FIG. 9 is a flowchart illustrating an embodiment of a Mix Routine 900. The Mix Routine 900 determines “how many” relative to the many dimensions available in the PriceDNA. At block 905, the Mix Routine 900 obtains a first segmentation criterion, such as, for example, a Product Name, Brand, or Category. At block 910, a first sub-segmentation criterion may be obtained, such as, for example, a Store, Location, or Price Band. At block 915, a second sub-segmentation criterion may be obtained, such as, for example, a Store, Location, or Price Band. At block 920, the number of Products, such as by MPID 332, which meet the criteria of blocks 905, 910, and 915 may be counted. At block 925, the result of block 920 may be subdivided or grouped by Location, Time, Season, Price Band, Used/New or other dimensions available in the PriceDNA. At block 930, the results of blocks 920 and/or 925 may be saved as Mix values in the Insights 375 records. At block 999, the process may return.
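By way of example only, the counting of blocks 920 and 925 might be sketched as follows. The record keys and the function name are assumptions for this illustration.

```python
from collections import Counter

def mix_counts(products, criteria, group_by=None):
    """Count distinct MPIDs matching every criterion, optionally per group."""
    matches = [p for p in products
               if all(p.get(key) == value for key, value in criteria.items())]
    if group_by is None:
        return len({p["mpid"] for p in matches})
    # De-duplicate so each MPID is counted once per group value.
    seen = {(p[group_by], p["mpid"]) for p in matches}
    return dict(Counter(group for group, _ in seen))

products = [
    {"mpid": "m1", "brand": "X", "store": "S1", "location": "west"},
    {"mpid": "m1", "brand": "X", "store": "S1", "location": "east"},
    {"mpid": "m2", "brand": "X", "store": "S1", "location": "west"},
    {"mpid": "m3", "brand": "Y", "store": "S1", "location": "west"},
]
total = mix_counts(products, {"brand": "X", "store": "S1"})
per_location = mix_counts(products, {"brand": "X", "store": "S1"},
                          group_by="location")
```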

FIG. 10 is a flowchart illustrating an embodiment of a Prediction Routine 1000. The Prediction Routine 1000 makes price predictions relative to the many dimensions available in the PriceDNA. At block 1005, the Prediction Routine 1000 obtains a Product and obtains or identifies an MPID 332 and/or iPIDs 330 associated therewith. At block 1010, the last Price of the Product by MPID 332 and/or iPID 330 may be obtained, such as from the Core Price 380 records. At block 1015, first and second linear regression parameters may be calculated or obtained.

At block 1020, the second parameter multiplied by the last price of the Product from block 1010 may be added to the first parameter. At block 1025 an error term may be added to the result of block 1020. At block 1030 a confidence interval may be calculated. At block 1035 the result may be saved as Predictions in the Insights 375 records. At block 1035 the Prediction Routine 1000 may then return.

In FIG. 10, the predicted Price for a product may be determined according to the following equation: $p_{t} = \alpha + \beta p_{(t-1)} + \varepsilon$, where $p_{t}$ is the price at time t, α and β are the parameters of the linear regression, and ε is the error term, which is assumed to be Normally distributed. Confidence, C, is a measure that represents the chance of making a 0.01% error in predicting the price of the product, C = normsdist(Z), and

$Z = \frac{0.01\% \times \mathrm{Price}}{\mathrm{Std.\ Error}}.$

In this formula, the parameters of the model are estimated using the ordinary least squares method as follows:

$\hat{\beta} = \frac{\Sigma\, p_{(t-1)} p_{t} - \frac{1}{n}\Sigma\, p_{(t-1)}\, \Sigma\, p_{t}}{\Sigma\, p_{(t-1)}^{2} - \frac{1}{n}\left( \Sigma\, p_{(t-1)} \right)^{2}} \quad \mathrm{and} \quad \hat{\alpha} = \bar{p}_{t} - \hat{\beta}\,\bar{p}_{(t-1)}$
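A minimal sketch of the prediction and confidence calculation follows, under stated assumptions: the slope uses the standard least squares form (cross-product sum minus the product of the sums over n, divided by the corresponding sum of squares), normsdist is taken to be the standard normal cumulative distribution function, and the error term is omitted from the point prediction. The function names are hypothetical.

```python
from statistics import NormalDist, mean

def fit_ar1(prices):
    """Least squares fit of p_t = alpha + beta * p_(t-1) on a price series."""
    x, y = prices[:-1], prices[1:]
    n = len(x)
    sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    sxx = sum(a * a for a in x) - sum(x) ** 2 / n
    if sxx == 0:
        raise ValueError("lagged prices are constant; slope is undefined")
    beta = sxy / sxx
    alpha = mean(y) - beta * mean(x)
    return alpha, beta

def predict_next(prices):
    """Point prediction: alpha + beta * last price (error term omitted)."""
    alpha, beta = fit_ar1(prices)
    return alpha + beta * prices[-1]

def confidence(price, std_error, tolerance=0.0001):
    """C = normsdist(Z), with Z = 0.01% * Price / Std. Error."""
    return NormalDist().cdf(tolerance * price / std_error)

alpha, beta = fit_ar1([10, 12, 16, 24])   # series follows p_t = 2 * p_(t-1) - 8
forecast = predict_next([10, 12, 16, 24])
c = confidence(price=100, std_error=0.01)  # Z = 1.0
```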

FIG. 11 is a flowchart illustrating an embodiment of a Competition Routine 1100. The Competition Routine 1100 determines competitors relative to Stores, Brands, or Merchants. At box 1105, a first and second (or more) Store, Brand, or Merchant may be obtained, along with an optional Category 335. These may be obtained from a user or another routine. At box 1110, all Products sold by or under each of the entities of box 1105 may be obtained, such as from the PriceDNA. The Products may optionally be filtered by the Category of box 1105.

At box 1115, a determination may be made regarding whether or not the entities of box 1105 have 70% or more overlapping Products, per the Products of box 1110. The affirmative output of this box may be saved as Competitors in the Insights 375 records.
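The determination of box 1115 might be sketched as follows. The disclosure does not state the denominator of the overlap percentage; this example assumes the shared Products are measured against the smaller of the two catalogs, and the function names are hypothetical.

```python
def product_overlap(catalog_a, catalog_b):
    """Shared MPIDs as a share of the smaller catalog (an assumed definition)."""
    if not catalog_a or not catalog_b:
        return 0.0
    return len(catalog_a & catalog_b) / min(len(catalog_a), len(catalog_b))

def are_competitors(catalog_a, catalog_b, threshold=0.7):
    """True when the two entities overlap by at least the threshold."""
    return product_overlap(catalog_a, catalog_b) >= threshold

store_a = {f"m{i}" for i in range(10)}           # m0 .. m9
store_b = {f"m{i}" for i in range(2, 12)}        # m2 .. m11, shares 8 with store_a
store_c = {"m0", "x1", "x2", "x3"}               # shares 1 of 4 with store_a
```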

At box 1120, the Competitors may be filtered by, for example, one or more of Store, Substitute, Substitute by Price Band, Brand, Location (including Region), Time (including Season), and whether the Products are sold as used or new. Which criteria are used in the filter may be determined by input from a user. The output of box 1120 may be saved in the Insights 375 records.

At box 1125, the average price of Products in the Category 335 of box 1105 may be obtained relative to, for example, the Category 335, Substitute, Substitute by Price Band, Brand, Location, Time, used/new status, and other criteria. At box 1130, the output of box 1125 may be ranked and saved as Price Competitiveness in the Insights 375 records.

At box 1135, a Store and Location for a target Product may be obtained, such as from a user. At box 1140, the Competitors from box 1115 may be obtained or determined and the Competitors filtered to select only Competitors with sales in the Location of box 1135. At box 1145, Stores in the Location which are the same as the Store of box 1135 may be removed from the set of Competitors, leaving the remainder (those not removed).

At box 1150, the output of box 1145 may be placed in a Voronoi Diagram or similar data structure, with the location in the Voronoi Diagram being based on the physical location of the Stores of the Competitors. Generally, a Voronoi Diagram determines the distance between objects in a geometric manner, rather than a power-law manner. At box 1155, the distance between the target Store and each Competitor may be ranked. At box 1160, the output of box 1155 may be saved as Reach Competitiveness in the Insights 375 records.
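As a simplified illustration of the filtering and distance ranking described above, the following sketch ranks Competitor Stores by straight-line distance from the target Store. It does not construct a Voronoi Diagram; it assumes planar coordinates and hypothetical record keys, and is offered only to show the ranking step.

```python
from math import dist

def reach_competitiveness(target, stores, competitor_names):
    """Rank Competitor Stores in the target's Location by distance to the target."""
    nearby = [s for s in stores
              if s["name"] in competitor_names
              and s["location"] == target["location"]
              and s["name"] != target["name"]]   # drop the target's own Store
    ranked = ((s["name"], dist(target["xy"], s["xy"])) for s in nearby)
    return sorted(ranked, key=lambda pair: pair[1])

target = {"name": "T", "location": "west", "xy": (0, 0)}
stores = [
    {"name": "A", "location": "west", "xy": (3, 4)},
    {"name": "B", "location": "west", "xy": (1, 0)},
    {"name": "C", "location": "east", "xy": (0, 1)},
    {"name": "T", "location": "west", "xy": (0, 0)},
]
ranking = reach_competitiveness(target, stores, {"A", "B", "C", "T"})
```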

FIG. 12 is a flowchart illustrating an embodiment of a Promotion Routine 1200. The Promotion Routine 1200 determines promotions relative to Products, Stores, Brand, Seasons, and other dimensions available in the PriceDNA. At box 1205, a Product may be obtained, such as from user input, and the MPID 332 and/or an iPID 330 corresponding to the Product may be identified in the Attributes 340/345 (via, for example, the Sequential File 365). The Product may be a single Product or a Bundle comprising multiple Products. At box 1210, a “Promotion” value may be identified in the Attributes 340/345 associated with the MPID 332 and/or iPID 330; the “Promotion” value may be a Sale Price and/or a Promotion Code in the Price Attribute 340 records associated with the MPID 332 and/or iPID 330. Alternatively, at box 1210 the Price history for the MPID 332 and/or iPID 330 may be graphed.

At box 1215, the number, length, date/time, and magnitude of the Promotions may be determined and saved as Promotions in the Insights 375 records. Alternatively, the number, length, date/time, and magnitude of the low-points in the graph of box 1210 may be determined and saved as Promotions in the Insights 375 records. At box 1220, the output of box 1215 may be filtered by criteria such as, for example, date/time, Price Band, Location (including Region), Season, and Holidays. The criteria may be received from, for example, a user and/or a default set of criteria may be applied, with the result of each being saved in the Insights 375 records.
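The alternative low-point approach of box 1215 might be sketched as follows. The disclosure does not say how a low-point is recognized; this example assumes the regular Price is the median of the history, treats each consecutive run of observations below it as one Promotion, and assumes one observation per day. The function name is hypothetical.

```python
from statistics import median

def find_promotions(history):
    """history: (day, price) pairs sorted by day; returns one dict per low run."""
    regular = median(price for _, price in history)
    runs, current = [], []
    for day, price in history:
        if price < regular:
            current.append((day, price))
        elif current:
            runs.append(current)
            current = []
    if current:
        runs.append(current)
    return [{"start": run[0][0],
             "length": len(run),
             "magnitude": regular - min(price for _, price in run)}
            for run in runs]

history = [(1, 100), (2, 100), (3, 80), (4, 80), (5, 100),
           (6, 100), (7, 90), (8, 100), (9, 100)]
promotions = find_promotions(history)
```

The number of Promotions is then the length of the returned list, and each entry carries the start, length, and magnitude named in the text.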

At box 1225 a time period and a Merchant may be obtained, such as from a user; the Merchant may be associated with the Product of box 1205. At box 1225, the number of Products sold by the Merchant in Promotion during the time period may be determined.

At box 1230, the result of box 1215 may be benchmarked relative to average Promotion times, durations, and magnitude for other Products (including other Bundles of the Product), the timing of Promotions for other Products, relative to the magnitude of Promotions for other Products, relative to the Products associated with a Brand, relative to all Products sold at a Store, relative to Products in a Price Band, and relative to Competitors and Substitutes. The result may be saved in the Insights 375 records.

FIG. 13 is a flowchart illustrating an embodiment of a Leadership Routine 1300. The Leadership Routine 1300 determines which Products lead or follow others in terms of price changes. At box 1305, a Product may be obtained, for example, from a user or another routine, and the associated MPID 332 determined. At box 1310 Substitutes for the Product may be obtained (such as from or by the Substitution Routine 800). At box 1315, the change in Price, or Price delta, for the Product and the Substitutes may be determined over periods of time. The Price delta may be determined in an absolute sense (whether the change was positive or negative) and/or with a determination of the magnitude of the Price delta.

At box 1320, the Price deltas determined at box 1315 may be matched, to determine if any of the Price deltas with the same absolute value (positive or negative) occurred within a time window of one another (deltas beyond the time window may not be considered to be correlated), with the result being saved as a Leader/Follower indication in the Insights 375 records.
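One possible sketch of the matching of box 1320 follows. It assumes Prices are held as integers (for example, cents) so magnitudes can be compared exactly, reads "same absolute value" as equal magnitude regardless of sign, and ignores changes that occur on the same day. The function names and the three-day window are assumptions.

```python
def price_deltas(history):
    """(day, change) for each point where the price differs from the prior one."""
    return [(d2, p2 - p1)
            for (d1, p1), (d2, p2) in zip(history, history[1:]) if p2 != p1]

def lead_follow(product_history, substitute_history, window=3):
    """Label the Product a leader or follower for each matched pair of deltas."""
    matches = []
    for day_a, delta_a in price_deltas(product_history):
        for day_b, delta_b in price_deltas(substitute_history):
            lag = day_b - day_a
            if abs(delta_a) == abs(delta_b) and 0 < abs(lag) <= window:
                role = "leader" if lag > 0 else "follower"
                matches.append((role, day_a, day_b))
    return matches

product = [(1, 10000), (5, 9000), (10, 9000)]
substitute = [(1, 10000), (7, 9000)]
result = lead_follow(product, substitute)
```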

At box 1325, the matching Price deltas of box 1320 may be graphed according to time. At box 1330, the result of box 1325 may be filtered by criteria such as Region, Time, Date/Time, Season, Price Band, and Store.

At box 1335, the number of Leaders and Followers may be determined relative to a time period. At box 1340, the average lead/follow time may be determined. At box 1345, leaders/followers with respect to exact Product matches (for different Stores selling the same Product, determined at box 1330) may be identified. At box 1350, the results may be benchmarked relative to the number of leaders/followers and other criteria. The result of various of the boxes in FIG. 13 may be saved in the Insights 375 records. At box 1399, the Leadership Routine 1300 may return.

FIG. 14 is a flowchart illustrating an embodiment of a Premium Routine 1400. The Premium Routine 1400 determines which Products (generally, by MPID) in a Category 335 charge higher Prices (premium). At box 1405, a Product may be received, such as from input by a user or another routine. At box 1410, the Substitutes for the Product may be determined or obtained from another routine, such as the Substitution Routine 800 and/or the Insights 375 records. At box 1415, the Prices of the Product and of the Substitutes may be obtained, such as from the Core Price 380 records. At box 1420, the obtained Prices of box 1415 may be graphed or mapped and the top of the Price distribution identified. The top of the Price distribution may be the top five or ten percent, or the top five Products or Substitutes, which may be identified and saved as the “Premium” Products in the Insights 375 records.
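For illustration, the selection of box 1420 might be sketched as follows. The function name, the mapping from MPID to Price, and the choice to always return at least one Product are assumptions of this example.

```python
import math

def premium_products(prices, top_fraction=0.10):
    """Return the MPIDs in the top fraction of the Price distribution."""
    ranked = sorted(prices, key=prices.get, reverse=True)
    keep = max(1, math.ceil(len(ranked) * top_fraction))
    return ranked[:keep]

prices = {f"m{i}": 10 * (i + 1) for i in range(10)}   # m0=10 ... m9=100
top_ten_percent = premium_products(prices)
top_fifth = premium_products(prices, top_fraction=0.2)
```

A fixed count, such as the top five Products, could be obtained instead by slicing the ranked list directly.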

At box 1425, the Product Attributes 345 of the Products and Substitutes of box 1410 may be obtained and clustered by similarity. At box 1430, the Product Attributes 345 unique to or dominant in the Premium Products, determined by the clusters of box 1425, may be identified and saved in the Insights 375 records.

At box 1435, user votes regarding Product Attributes 345 of Premium Products may be received. At box 1440, the user votes may be tallied and, at box 1445, the “winning” Product Attributes 345 (with the most votes) may be set as the Product Attributes 345 associated with the Premium Products in the Insights 375 records.

FIG. 15 is a flowchart illustrating an embodiment of a Price Range Routine 1500. The Price Range Routine 1500 determines the number of price ranges and maximum and minimum for iPIDs, MPIDs, and categories. At box 1505, a Product may be obtained, such as from a user or another routine. At box 1510, the Prices for the Product may be obtained, such as from the PriceDNA for the Product. At box 1515, the Prices of box 1510 may be clustered by similarity and with a minimum cluster size, with the range in Price across each cluster being saved as Price Ranges for the Product in the Insights 375 records.
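The clustering of box 1515 is not tied to a particular algorithm in the text. As one hedged possibility, the sketch below sorts the Prices and starts a new cluster whenever the gap to the previous Price exceeds a threshold, discarding clusters below the minimum size. The function name, gap threshold, and minimum size are assumptions.

```python
def price_ranges(prices, max_gap=5.0, min_size=2):
    """Gap-based one-dimensional clustering; returns (low, high) per cluster."""
    clusters, current = [], []
    for price in sorted(prices):
        if current and price - current[-1] > max_gap:
            clusters.append(current)
            current = []
        current.append(price)
    if current:
        clusters.append(current)
    return [(c[0], c[-1]) for c in clusters if len(c) >= min_size]

observed = [31, 10, 80, 12, 30, 11]
ranges = price_ranges(observed)
overall = (min(observed), max(observed))   # minimum and maximum across all Prices
```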

At box 1520, the Channel Range for the Product may be set as the minimum and maximum of the Prices of box 1510 and saved in the Insights 375 records. At box 1525, the results of boxes 1510, 1515, and 1520 may be filtered by, for example, Region, Date/Time, and according to other criteria and saved in the Insights 375 records. At box 1530, the Price Ranges may be determined relative to all Products in a Category 335, all Products by a Brand, and relative to a benchmark which may be, for example, the maximum number of Price Ranges within a Category 335. The result thereof may be saved as Price Ranges in the Insights 375 records.

FIG. 16 is a flowchart illustrating an embodiment of a Reach Routine 1600. The Reach Routine 1600 determines the reach of an iPID or MPID in terms of the number of people who visit a sales venue. At box 1605, a Product may be obtained, such as from a user or another routine. At box 1610, the Stores offering the Product for sale may be obtained. At box 1615, the traffic at the Stores may be obtained, such as from a source for online webpage/website traffic, such as Alexa or similar. At box 1620, the result of box 1615 may be filtered by, for example, criteria such as Date/Time (including Season), Location (including Region), Holiday, and other criteria. The result thereof may be saved as Reach in the Insights 375 records. At box 1699, the Reach Routine 1600 may return.

FIG. 17 is a flowchart illustrating an embodiment of a User Contact Routine 1700. At box 1705, a user contact with the User Contact Routine 1700 may be detected. The user contact may be part of a user-interface served by the User Contact Routine 1700. At box 1710, a user query may be received, such as for PriceDNA records and/or Insight records. At box 1715, the user query may be executed relative to the Index 370 and the Sequential File 365. At box 1720, a determination may be made regarding whether the user has requested that the query be stored as an alert. If so, then at box 1725 a time period for the alert may be obtained or set (such as according to a default time period, such as once per day or week). At box 1730, on occurrence of the time period of box 1725, the query may be executed relative to the Index 370 and the Sequential File 365. At box 1735, an alert or other message may be sent to contact information associated with the user. At box 1799, the User Contact Routine 1700 may conclude.

The above Detailed Description of embodiments is not intended to be exhaustive or to limit the disclosure to the precise form disclosed above. While specific embodiments and examples are described above for illustrative purposes, various equivalent modifications are possible within the scope of the system, as those skilled in the art will recognize. For example, while processes or blocks are presented in a given order, alternative embodiments may perform routines having operations, or employ systems having blocks, in a different order, and some processes or blocks may be deleted, moved, added, subdivided, combined, and/or modified. While processes or blocks are at times shown as being performed in series, these processes or blocks may instead be performed in parallel, or may be performed at different times. Further, any specific numbers noted herein are only examples; alternative implementations may employ differing values or ranges.

1. A computer implemented method of processing information from webpages, the method comprising: receiving a first and a second set of price and product attributes for a first product, which attributes comprise: a first identifier of a first identifier-type derived from a URI which links to a webpage offering the product for sale, a second identifier of a second identifier-type assigned to all instances of the product as offered for sale at any URI, and a first category in a category taxonomy; performing a first URI-specific price analysis of price values in the first and second sets of price and product attributes to identify changes in price for the first product and associating the result with the first identifier of the first identifier-type and saving the result as a first URI-specific core price result; receiving a third and fourth set of price and product attributes for a second product, which attributes comprise: a third identifier of the first identifier-type, a fourth identifier of the second identifier-type, and a second category in the category taxonomy; performing a second URI-specific price analysis of price values in the third and fourth sets of price and product attributes to identify changes in price for the second product and associating the result with the third identifier of the first identifier-type and saving the result as a second URI-specific core price result; when the second and fourth identifiers are the same, performing a first non-URI-specific price analysis utilizing the first and second URI-specific core price results to identify changes in price according to the second identifier-type and saving the result as a first non-URI-specific core price result; saving and indexing the output of the URI-specific and non-URI-specific price analyses in a first file structure and making the first file structure available to be searched substantially as the sets of product and price attributes are received; performing a meta-analysis utilizing the URI-specific and non-URI-specific core price results to identify what product and price attributes across the datasets are associated with the changes in price; and saving and indexing the output of the meta-analysis as a second file structure and making the second file structure available to be searched.
2. The method of claim 1, further comprising merging new product attribute records into prior product attribute records and saving new price attribute records along with prior price attribute records.
3. The method of claim 1, wherein the URI-specific price analysis comprises determining the high, low, average, mean, magnitude and number of price changes over at least one time period for the price and product attributes associated with the same identifier of the first identifier-type.
4. The method of claim 1, wherein the non-URI-specific price analysis comprises determining the high, low, average, mean, magnitude and number of price changes over at least one time period for the price and product attributes associated with the same identifier of the second identifier-type.
5. The method of claim 4, wherein at least one of the first and second identifier-types is further associated with at least one of a store, a merchant, and a location and wherein the non-URI-specific price analysis produces results associated therewith.
6. The method of claim 1, wherein the price attributes comprise at least one of a time, a product name, a price, a quantity, a unit of measurement, a merchant name, a store name, a bundle detail, and a location.
7. The method of claim 1, wherein the product attributes comprise at least one of a title, a brand, a category in the category taxonomy, a color, a product type, and a size.
8. The method of claim 1, further comprising receiving a user query and executing the query relative to the first and/or second file structures.
9. The method of claim 1, further comprising receiving a user query, a schedule for executing the query, executing the query at the scheduled time on the first and/or second file structures, and alerting the user regarding the result of the query.
10. The method of claim 1, wherein the first and second file structures may be searched by at least one of the first identifier-type, the second identifier-type, or a category in the category taxonomy.
11. The method of claim 1, wherein the first and second categories are the same.
12. The method of claim 1, wherein the meta-analysis determines the volatility of price changes over time for each of the first and second products.
13. The method of claim 12, wherein the volatility is determined by counting the number of price changes in a time period according to at least one of the first identifier-type, the second identifier-type, a brand, a region, a price band, and a category in the category taxonomy.
14. The method of claim 1, wherein the meta-analysis determines whether one of the products is a substitute for the other.
15. The method of claim 14, wherein whether one of the products is a substitute for the other is determined by determining if the first and second products are in the same category in the category taxonomy and by determining whether the first and second products are within a price band within the category.
16. The method of claim 15, further comprising determining if the first and second products share at least fifty-percent of the same product attributes.
17. The method of claim 1, wherein the meta-analysis determines predictions regarding the future prices for the products.
18. The method of claim 17, wherein the predictions are determined by obtaining the last price of at least one of the products from the URI-specific core price associated therewith, calculating or obtaining first and second linear regression parameters, multiplying the second linear regression parameter by the last price and adding this to the first linear regression parameter.
19. The method of claim 1, wherein the price and product attributes comprise at least one of a store, merchant, or brand and the meta-analysis determines products associated therewith and competitors thereof.
20. The method of claim 1, wherein the meta-analysis determines whether a price change for the first product leads or follows a price change for the second product.
21. The method of claim 1, wherein the meta-analysis determines whether the first or second product is a premium product relative to the other.
22. The method of claim 1, wherein the meta-analysis determines the price ranges in which the products are offered for sale.
23. A webpage information processing computing apparatus, the apparatus comprising a processor and a memory storing instructions that, when executed by the processor, configure the apparatus to: receive a first and a second set of price and product attributes for a first product, which attributes comprise: a first identifier of a first identifier-type derived from a URI which links to a webpage offering the product for sale, a second identifier of a second identifier-type assigned to all instances of the product as offered for sale at any URI, and a first category in a category taxonomy; perform a first URI-specific price analysis of price values in the first and second sets of price and product attributes to identify changes in price for the first product and associate the result with the first identifier of the first identifier-type and save the result as a first URI-specific core price result; receive a third and fourth set of price and product attributes for a second product, which attributes comprise: a third identifier of the first identifier-type, a fourth identifier of the second identifier-type, and a second category in the category taxonomy; perform a second URI-specific price analysis of price values in the third and fourth sets of price and product attributes to identify changes in price for the second product and associate the result with the third identifier of the first identifier-type and save the result as a second URI-specific core price result; when the second and fourth identifiers are the same, perform a first non-URI-specific price analysis utilizing the first and second URI-specific core price results to identify changes in price according to the second identifier-type and save the result as a first non-URI-specific core price result; save and index the output of the URI-specific and non-URI-specific price analyses in a first file structure and make the first file structure available to be searched substantially as the sets of product and price attributes are received; perform a meta-analysis utilizing the URI-specific and non-URI-specific core price results to identify what price and product attributes across the datasets are associated with the changes in price; and save and index the output of the meta-analysis as a second file structure and make the second file structure available to be searched.
24. A non-transient computer-readable storage medium having stored thereon instructions that, when executed by a processor, configure the processor to: receive a first and a second set of price and product attributes for a first product, which attributes comprise: a first identifier of a first identifier-type derived from a URI which links to a webpage offering the product for sale, a second identifier of a second identifier-type assigned to all instances of the product as offered for sale at any URI, and a first category in a category taxonomy; perform a first URI-specific price analysis of price values in the first and second sets of price and product attributes to identify changes in price for the first product and associate the result with the first identifier of the first identifier-type and save the result as a first URI-specific core price result; receive a third and fourth set of price and product attributes for a second product, which attributes comprise: a third identifier of the first identifier-type, a fourth identifier of the second identifier-type, and a second category in the category taxonomy; perform a second URI-specific price analysis of price values in the third and fourth sets of price and product attributes to identify changes in price for the second product and associate the result with the third identifier of the first identifier-type and save the result as a second URI-specific core price result; when the second and fourth identifiers are the same, perform a first non-URI-specific price analysis utilizing the first and second URI-specific core price results to identify changes in price according to the second identifier-type and save the result as a first non-URI-specific core price result; save and index the output of the URI-specific and non-URI-specific price analyses in a first file structure and make the first file structure available to be searched substantially as the sets of product and price attributes are received; perform a meta-analysis utilizing the URI-specific and non-URI-specific core price results to identify what price and product attributes across the datasets are associated with the changes in price; and save and index the output of the meta-analysis as a second file structure and make the second file structure available to be searched.